Multi-modal pedestrian detection with misalignment based on modal-wise regression and multi-modal IoU

Authors

Abstract

Multi-modal pedestrian detection, which integrates visible and thermal sensors, has been developed to overcome many limitations of visible-modal pedestrian detection, such as poor illumination, cluttered backgrounds, and occlusion. By combining multiple modalities, we can efficiently detect pedestrians even with poor visibility. Nevertheless, the critical assumption of multi-modal pedestrian detection is that the multi-modal images are perfectly aligned. In real-world situations, however, this assumption is often invalid. The viewpoints of the different modal sensors usually differ, so the positions of pedestrians in the multi-modal images have disparities. We propose a multi-modal Faster R-CNN specifically designed to handle the misalignment between the two modalities. The proposed Faster R-CNN consists of a region proposal network (RPN) and a detector. We introduce position regressors for both modalities in the RPN and the detector. Intersection over union (IoU) is one of the most useful metrics for object detection, but it is defined only for a single-modal image. We extend it into a multi-modal IoU to evaluate the preciseness of detection in both modalities. Our experimental results and multi-modal IoU evaluation demonstrate that the proposed method performs comparably to state-of-the-art methods and outperforms them on data with significant misalignment.
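The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. It assumes the standard Faster R-CNN box parameterization (dx, dy, dw, dh relative to an anchor). It models "position regressors for both modalities" as one set of deltas per modality applied to a shared anchor. The function `multimodal_iou` is hypothetical: it simply averages the per-modality IoUs and is not the authors' definition.

```python
import math
from typing import NamedTuple, Tuple

Deltas = Tuple[float, float, float, float]  # (dx, dy, dw, dh)


class Box(NamedTuple):
    """Axis-aligned box in (x1, y1, x2, y2) pixel coordinates."""
    x1: float
    y1: float
    x2: float
    y2: float


def iou(a: Box, b: Box) -> float:
    """Standard single-modal intersection over union."""
    iw = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    ih = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = iw * ih
    union = (a.x2 - a.x1) * (a.y2 - a.y1) + (b.x2 - b.x1) * (b.y2 - b.y1) - inter
    return inter / union if union > 0 else 0.0


def decode(anchor: Box, deltas: Deltas) -> Box:
    """Faster R-CNN-style decoding: shift the anchor center by (dx*w, dy*h)
    and scale its size by (exp(dw), exp(dh))."""
    dx, dy, dw, dh = deltas
    w, h = anchor.x2 - anchor.x1, anchor.y2 - anchor.y1
    cx = anchor.x1 + 0.5 * w + dx * w
    cy = anchor.y1 + 0.5 * h + dy * h
    nw, nh = w * math.exp(dw), h * math.exp(dh)
    return Box(cx - 0.5 * nw, cy - 0.5 * nh, cx + 0.5 * nw, cy + 0.5 * nh)


def modal_wise_decode(anchor: Box, d_visible: Deltas, d_thermal: Deltas) -> Tuple[Box, Box]:
    """ASSUMED modal-wise regression: separate deltas per modality, so one
    anchor can land at different positions in the visible and thermal images."""
    return decode(anchor, d_visible), decode(anchor, d_thermal)


def multimodal_iou(pred_pair: Tuple[Box, Box], gt_pair: Tuple[Box, Box]) -> float:
    """HYPOTHETICAL multi-modal IoU: mean of the visible and thermal IoUs."""
    (pv, pt), (gv, gt) = pred_pair, gt_pair
    return 0.5 * (iou(pv, gv) + iou(pt, gt))


if __name__ == "__main__":
    anchor = Box(100, 50, 140, 130)  # 40 x 80 px pedestrian anchor
    vis, th = modal_wise_decode(anchor, (0.0, 0.0, 0.0, 0.0), (0.25, 0.0, 0.0, 0.0))
    gt = (Box(100, 50, 140, 130), Box(110, 50, 150, 130))  # 10 px disparity
    print(iou(vis, gt[1]))                # one shared box vs. thermal ground truth
    print(multimodal_iou((vis, th), gt))  # per-modality boxes vs. both ground truths
```

In this toy case the thermal ground truth is shifted 10 px from the visible one. A single shared box therefore scores only IoU 0.6 against it, while the per-modality boxes match both ground truths exactly. This illustrates why a single-modal IoU can understate detection quality on misaligned data.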

Similar resources

Damage detection of multi-girder bridge superstructure based on the modal strain approaches

The research described in this paper focuses on the application of modal strain techniques on a multi-girder bridge superstructure with the objectives of identifying the presence of damage and detecting false damage diagnosis for such structures. The case study is a one-third scale model of a slab-on-girder composite bridge superstructure, comprised of a steel-free concrete deck with FRP rebars...

Multi-modal human aggression detection

This paper presents a smart surveillance system named CASSANDRA, aimed at detecting instances of aggressive human behavior in public environments. A distinguishing aspect of CASSANDRA is the exploitation of complementary audio and video cues to disambiguate scene activity in real-life environments. From the video side, the system uses overlapping cameras to track persons in 3D and to extract fe...

Music Emotion Regression based on Multi-modal Features

Music emotion regression is considered more appropriate than classification for music emotion retrieval, since it resolves some of the ambiguities of emotion classes. In this paper, we propose an AdaBoost-based approach for music emotion regression, in which emotion is represented in PAD model and multi-modal features are employed, including audio, MIDI and lyric features. We first demonstrate ...

Efficient codes for multi-modal pose regression

Redundancy reduction, or sparsity, appears to be an important information-theoretic principle for encoding natural sensory data. While sparse codes have been the subject of much recent research, they have primarily been evaluated using readily available datasets of natural images and sounds. In comparison, relatively little work has investigated the use of sparse codes for representing informat...

Multi-Modal Utile Distinctions

We introduce Multi-Modal Utility Trees (MMU), an algorithm for autonomously learning decision tree-based state abstractions in Partially Observable Markov Decision Processes with multi-modal observations. MMU builds the trees using the Kolmogorov-Smirnov statistical test. Additionally, MMU incorporates the ability to perform online tree restructuring, enabling it to build and maintain a compact ...

Journal

Journal title: Journal of Electronic Imaging

Year: 2023

ISSN: 1017-9909, 1560-229X

DOI: https://doi.org/10.1117/1.jei.32.1.013025